Redescription Mining
Authors: Esther Galbrun and Pauli Miettinen
Abstract
In scientific investigations, data oftentimes differ in nature; for instance, they might originate from distinct sources or be cast over separate terminologies. In order to gain insight into the phenomenon of interest, an intuitive first task is to identify the correspondences that exist between these different aspects. This is the motivating principle behind redescription mining, a data analysis task that aims at finding distinct common characterizations of the same objects. In this chapter, we provide the basic definitions of redescription mining, including the data model, query languages, similarity measures, p-value calculations, and methods for pruning redundant redescriptions. We will also briefly cover related data analysis methods and provide a short history of redescription mining research.
What is redescription mining?
The answer to the eponymous question of this chapter involves some amount of theoretical framework-building: definitions that are used to make other definitions that in turn are used to define yet new concepts that, hopefully, finally yield a coherent and complete definition of redescription mining. That, at least, is the mathematical way to answer the question. A more holistic approach would be to consider how redescription mining relates to other data analysis methods, defining it not by what it is, but through its similarities and dissimilarities. Or, perhaps one could define redescription mining by looking at its evolution, asking how it started and how it became what it is. These are three valid approaches for defining redescription mining, and we will examine them in this chapter. First, though, let us answer the titular question of this chapter with an ostensive definition of redescription mining.
1.1 First Examples of Redescriptions
Consider an ecologist who wants to understand what kind of bioclimatic environment different mammal species require. She knows the regions the different mammal species inhabit, and she knows the bioclimatic conditions of those places, such as
Similar resources
A Case of Visual and Interactive Data Analysis: Geospatial Redescription Mining
We present a method for visual and interactive geospatial redescription mining. The goal of geospatial redescription mining is to characterize geospatial areas using two different descriptions, such as their bioclimatic features and fauna. Indeed, one application of geospatial redescription mining is finding bioclimatic niches, i.e. explaining the distribution of species using their bioclimatic...
Siren: An Interactive Tool for Mining and Visualizing Geospatial Redescriptions [Demo]
We present Siren, an interactive tool for mining and visualizing geospatial redescriptions. Redescription mining is a powerful data analysis tool that aims at finding alternative descriptions of the same entities. For example, in biology, an important task is to identify the bioclimatic constraints that allow some species to survive, that is, to describe geographical regions in terms of both th...
From Black and White to Full Colour: Extending Redescription Mining Outside the Boolean World
Redescription mining is a powerful data analysis tool that is used to find multiple descriptions of the same entities. Consider geographical regions as an example. They can be characterized by the fauna that inhabits them on one hand and by their meteorological conditions on the other hand. Finding such redescriptors, a task known as niche-finding, is of much importance in biology. But current ...
Mining Redescriptions with Siren
In many areas of science, scientists need to find distinct common characterizations of the same objects and, vice versa, to identify sets of objects that admit multiple shared descriptions. For example, in biology, an important task is to identify the bioclimatic constraints that allow some species to survive, that is, to describe geographical regions both in terms of the fauna that inhabits th...
Towards Finding Relational Redescriptions
This paper introduces relational redescription mining, that is, the task of finding two structurally different patterns that describe nearly the same set of object tuples in a relational dataset. By extending redescription mining beyond propositional and real-valued attributes, it provides a powerful tool to match different relational descriptions of the same concept. As a first step towards so...
Publication date: 2018